Run-Length Encoded Nondeterministic KMP and Suffix Automata

Author

  • Emanuele Giaquinta

Abstract

We present a novel bit-parallel representation, based on run-length encoding, of the nondeterministic KMP and suffix automata for a string P with at least two distinct symbols. Our method targets long strings over small alphabets and complements the method of Cantone et al. (2012), which is effective for long strings over large alphabets. Our encoding requires O((σ+m)⌈ρ/w⌉) space and simulates the automata on a string in O(⌈ρ/w⌉) time per transition, where σ is the alphabet size, m is the length of P, ρ is the length of the run-length encoding of P, and w is the machine word size in bits. The input string can be given in either unencoded or run-length encoded form.
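
As a rough illustration of the quantities in the abstract, and not the construction from the paper, the Python sketch below pairs a run-length encoder (which yields ρ) with the classic Shift-And simulation of the nondeterministic KMP automaton. Shift-And keeps one bit per pattern position and so needs ⌈m/w⌉ words, which is the cost the run-length encoded representation aims to reduce to ⌈ρ/w⌉. The function names, the example pattern, and the default w = 64 are assumptions made for illustration only.

```python
from itertools import groupby
from math import ceil


def run_length_encode(s):
    """Return the run-length encoding of s as (symbol, run_length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(s)]


def words_needed(bits, w=64):
    """Number of w-bit machine words needed to hold `bits` bits."""
    return ceil(bits / w)


def shift_and_search(pattern, text):
    """Classic Shift-And (one bit per pattern position), assuming a nonempty pattern.

    Bit i of `state` is set when pattern[:i + 1] is a suffix of the text
    read so far. Returns the start index of every occurrence.
    """
    m = len(pattern)
    # masks[c] has bit i set exactly when pattern[i] == c.
    masks = {}
    for i, symbol in enumerate(pattern):
        masks[symbol] = masks.get(symbol, 0) | (1 << i)
    accept = 1 << (m - 1)

    state = 0
    occurrences = []
    for j, symbol in enumerate(text):
        # Advance every active state by one position and restart at state 0.
        state = ((state << 1) | 1) & masks.get(symbol, 0)
        if state & accept:
            occurrences.append(j - m + 1)
    return occurrences


if __name__ == "__main__":
    # Hypothetical long pattern over a small alphabet: few runs, many symbols.
    pattern = "a" * 100 + "b" * 100
    m = len(pattern)
    rho = len(run_length_encode(pattern))
    print("m =", m, "words for m bits:", words_needed(m))
    print("rho =", rho, "words for rho bits:", words_needed(rho))
    print(shift_and_search("aab", "aaabaab"))
```

Python integers are unbounded, so the sketch hides the word count; in a fixed-width language the state would occupy words_needed(m) machine words, and each transition would touch all of them.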

Similar resources

A Compact Representation of Nondeterministic (Suffix) Automata for the Bit-Parallel Approach

Article history: Available online 2 February 2012. We present a novel technique, suitable for bit-parallelism, for representing both the nondeterministic automaton and the nondeterministic suffix automaton of a given string in a more compact way. Our approach is based on a particular factorization of strings which on the average allows to pack in a machine word of w bits automata state configura...

Practical Algorithmic Techniques for Several String Processing Problems

The domains of data mining and knowledge discovery make use of large amounts of textual data, which need to be handled efficiently. Specific problems, like finding the maximum weight ordered common subset of a set of ordered sets or searching for specific patterns within texts, occur frequently in this context. In this paper we present several novel and practical algorithmic techniques for proc...

Approximate Determinization of Quantitative Automata

Quantitative automata are nondeterministic finite automata with edge weights. They value a run by some function from the sequence of visited weights to the reals, and value a word by its minimal/maximal run. They generalize boolean automata, and have gained much attention in recent years. Unfortunately, important automaton classes, such as sum, discounted-sum, and limit-average automata, cannot...

Skriptum VL Text Indexing

In this section we will introduce suffix trees, which, among many other things, can be used to solve the string matching task (find pattern P of length m in a text T of length n in O(n + m) time). We already know that other methods (Boyer-Moore, e.g.) solve this task in the same time. So why do we need suffix trees? The advantage of suffix trees over the other string-matching algorithms (Boyer-...

Deletion Operations on Deterministic Families of Automata

Many different deletion operations are investigated, applied to languages accepted by one-way and two-way deterministic reversal-bounded multicounter machines, deterministic pushdown automata, and finite automata. Operations studied include the prefix, suffix, infix and outfix operations, as well as left and right quotient with languages from different families. It is often expected that language...

Publication date: 2015